Results 1 - 20 of 685
1.
Epidemics ; 39: 100576, 2022 06.
Article in English | MEDLINE | ID: mdl-35605437

ABSTRACT

The SARS-CoV-2 pandemic led to a huge increase in global pathogen genome sequencing efforts, and the resulting data are becoming increasingly important to detect variants of concern, monitor outbreaks, and quantify transmission dynamics. However, this rapid up-scaling in data generation brought with it many IT infrastructure challenges. In this paper, we report on the development of an improved system for genomic epidemiology. We (i) highlight key challenges that were exacerbated by the pandemic situation, (ii) provide data infrastructure design principles to address them, and (iii) give an implementation example developed by the Swiss SARS-CoV-2 Sequencing Consortium (S3C) in response to the COVID-19 pandemic. Finally, we discuss remaining challenges to data infrastructure for genomic epidemiology. Improving these infrastructures will help better detect, monitor, and respond to future public health threats.


Subject(s)
COVID-19 , Computational Biology/statistics & numerical data , Genomics , Pandemics , SARS-CoV-2/genetics , COVID-19/epidemiology , Computational Biology/trends , Humans , Molecular Sequence Data , Switzerland/epidemiology
2.
Comput Math Methods Med ; 2022: 8691646, 2022.
Article in English | MEDLINE | ID: mdl-35126641

ABSTRACT

Task scheduling in parallel multiple sequence alignment (MSA) through improved dynamic programming optimization speeds up alignment processing. The increased importance of multiple matching sequences also requires the use of parallel processor systems. This dynamic algorithm proposes improved task scheduling for parallel MSA. Specifically, the alignment of several tertiary-structured proteins is computationally more complex than simple word-based MSA. Parallel task processing is computationally more efficient for protein structure-based superposition. The basic condition for the application of dynamic programming is also fulfilled, because the task scheduling problem has multiple possible solutions or options. Search space reduction for speedy processing of this algorithm is carried out through a greedy strategy. Performance in terms of better results is ensured through computationally expensive recursive and iterative greedy approaches. Any optimal scheduling schemes show better performance in heterogeneous resources using CPU or GPU.


Subject(s)
Algorithms , Computational Biology/methods , Sequence Alignment/methods , Computational Biology/statistics & numerical data , Humans , Sequence Alignment/statistics & numerical data , Software
3.
J Diabetes Res ; 2021: 7619610, 2021.
Article in English | MEDLINE | ID: mdl-34917686

ABSTRACT

Fibroblasts are the essential cell type of skin and are highly involved in the wound regeneration process. In this study, we sought to screen for novel genes that play important roles in diabetic fibroblasts through bioinformatic methods. A total of 811 and 490 differentially expressed genes (DEGs) between diabetic and normal fibroblasts were screened out in GSE49566 and GSE78891, respectively. Furthermore, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways involved in type 2 diabetes were retrieved from miRWalk. Consequently, the integrated bioinformatic analyses revealed that the KEGG pathways shared between the DEG-identified and diabetes-related pathways were functionally enriched in the MAPK signaling pathway, and the MAPKAPK3, HSPA2, TGFBR1, and p53 signaling pathways were involved. Finally, ETV4 and NPE2 were identified as the targeted transcription factors of MAPKAPK3, HSPA2, and TGFBR1. Our findings may shed new light on the molecular mechanisms of fibroblast pathologies in patients with diabetic wounds and help target new factors to advance diabetic wound treatment in the clinic.


Subject(s)
Diabetes Mellitus/physiopathology , Fibroblasts/metabolism , Wounds and Injuries/genetics , Computational Biology/methods , Computational Biology/statistics & numerical data , Diabetes Complications/complications , Diabetes Complications/diagnosis , Diabetes Mellitus/genetics , Humans , Skin/physiopathology , Surveys and Questionnaires , Wounds and Injuries/physiopathology
4.
PLoS Comput Biol ; 17(11): e1009481, 2021 11.
Article in English | MEDLINE | ID: mdl-34762641

ABSTRACT

Functional, usable, and maintainable open-source software is increasingly essential to scientific research, but there is a large variation in formal training for software development and maintainability. Here, we propose 10 "rules" centered on 2 best practice components: clean code and testing. These 2 areas are relatively straightforward and provide substantial utility relative to the learning investment. Adopting clean code practices helps to standardize and organize software code in order to enhance readability and reduce cognitive load for both the initial developer and subsequent contributors; this allows developers to concentrate on core functionality and reduce errors. Clean coding styles make software code more amenable to testing, including unit tests that work best with modular and consistent software code. Unit tests interrogate specific and isolated coding behavior to reduce coding errors and ensure intended functionality, especially as code increases in complexity; unit tests also implicitly provide example usages of code. Other forms of testing are geared to discover erroneous behavior arising from unexpected inputs or emerging from the interaction of complex codebases. Although conforming to coding styles and designing tests can add time to the software development project in the short term, these foundational tools can help to improve the correctness, quality, usability, and maintainability of open-source scientific software code. They also advance the principal point of scientific research: producing accurate results in a reproducible way. In addition to suggesting several tips for getting started with clean code and testing practices, we recommend numerous tools for the popular open-source scientific software languages Python, R, and Julia.
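
As a hedged illustration of the two practices the abstract centers on, the following minimal Python sketch pairs a small, single-purpose function with unit tests that also document its intended use. The function `gc_content` and the test cases are hypothetical and are not taken from the paper.

```python
import unittest


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


class GcContentTest(unittest.TestCase):
    """Each test checks one specific, isolated behavior (hypothetical cases)."""

    def test_mixed_sequence(self):
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_lowercase_input_is_accepted(self):
        self.assertAlmostEqual(gc_content("ggcc"), 1.0)

    def test_empty_sequence_is_rejected(self):
        with self.assertRaises(ValueError):
            gc_content("")
```

Saved as a module, the tests can be run with `python -m unittest`; the test names double as a short description of how the function is expected to behave.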


Subject(s)
Computational Biology/statistics & numerical data , Software Design , Software , Programming Languages , Regression Analysis
5.
PLoS Comput Biol ; 17(11): e1009161, 2021 11.
Article in English | MEDLINE | ID: mdl-34762640

ABSTRACT

Network propagation refers to a class of algorithms that integrate information from input data across connected nodes in a given network. These algorithms have wide applications in systems biology, protein function prediction, inferring condition-specifically altered sub-networks, and prioritizing disease genes. Despite the popularity of network propagation, there is a lack of comparative analyses of different algorithms on real data and little guidance on how to select and parameterize the various algorithms. Here, we address this problem by analyzing different combinations of network normalization and propagation methods and by demonstrating schemes for the identification of optimal parameter settings on real proteome and transcriptome data. Our work highlights the risk of a 'topology bias' caused by the incorrect use of network normalization approaches. Capitalizing on the fact that network propagation is a regularization approach, we show that minimizing the bias-variance tradeoff can be utilized for selecting optimal parameters. The application to real multi-omics data demonstrated that optimal parameters could also be obtained by either maximizing the agreement between different omics layers (e.g. proteome and transcriptome) or by maximizing the consistency between biological replicates. Furthermore, we exemplified the utility and robustness of network propagation on multi-omics datasets for identifying ageing-associated genes in brain and liver tissues of rats and for elucidating molecular mechanisms underlying prostate cancer progression. Overall, this work compares different network propagation approaches and it presents strategies for how to use network propagation algorithms to optimally address a specific research question at hand.
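
To make the idea of propagating input scores across connected nodes concrete, here is a minimal sketch of random walk with restart, one common member of this class of algorithms. The abstract does not say which algorithms or normalizations were compared, so the column (degree) normalization, the `restart` parameter, and the toy graph are all assumptions made for illustration.

```python
def column_normalize(adjacency):
    """Divide each column by its sum so that every non-empty column sums to 1."""
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    return [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adjacency, seed_scores, restart=0.5,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing."""
    total = sum(seed_scores)
    if total <= 0:
        raise ValueError("seed_scores must contain positive input signal")
    w = column_normalize(adjacency)
    p0 = [s / total for s in seed_scores]
    p = p0[:]
    n = len(p)
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with all input signal placed on node 0.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, [1.0, 0.0, 0.0, 0.0])
```

In this sketch the restart probability plays the role of a smoothing parameter: values near 1 keep the scores close to the input, while values near 0 let the network topology dominate.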


Subject(s)
Algorithms , Computational Biology/methods , Aging/genetics , Aging/metabolism , Animals , Bias , Brain/metabolism , Computational Biology/statistics & numerical data , Data Interpretation, Statistical , Disease Progression , Gene Expression Profiling/statistics & numerical data , Gene Regulatory Networks , Genomics/statistics & numerical data , Humans , Liver/metabolism , Male , Prostatic Neoplasms/etiology , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Protein Interaction Maps , Proteomics/statistics & numerical data , RNA, Messenger/genetics , RNA, Messenger/metabolism , Rats , Systems Biology
6.
PLoS Comput Biol ; 17(11): e1009477, 2021 11.
Article in English | MEDLINE | ID: mdl-34793435

ABSTRACT

Over the past decade, biomarker discovery has become a key goal in psychiatry to aid in the more reliable diagnosis and prognosis of heterogeneous psychiatric conditions and the development of tailored therapies. Nevertheless, the prevailing statistical approach is still the mean group comparison between "cases" and "controls," which tends to ignore within-group variability. In this educational article, we used empirical data simulations to investigate how effect size, sample size, and the shape of distributions impact the interpretation of mean group differences for biomarker discovery. We then applied these statistical criteria to evaluate biomarker discovery in one area of psychiatric research: autism research. Across the most influential areas of autism research, effect size estimates ranged from small (d = 0.21, anatomical structure) to medium (d = 0.36, electrophysiology; d = 0.5, eye-tracking) to large (d = 1.1, theory of mind). We show that in normal distributions, this translates to approximately 45% to 63% of cases performing within 1 standard deviation (SD) of the typical range, i.e., they do not have a deficit/atypicality in a statistical sense. For a measure to have diagnostic utility as defined by 80% sensitivity and 80% specificity, Cohen's d of 1.66 is required, with still 40% of cases falling within 1 SD. However, in both normal and nonnormal distributions, 1 (skewness) or 2 (platykurtic, bimodal) biologically plausible subgroups may exist despite small or even nonsignificant mean group differences. This conclusion contrasts drastically with the way mean group differences are frequently reported. Over 95% of studies omitted the "on average" when summarising their findings in their abstracts ("autistic people have deficits in X"), which can be misleading as it implies that the group-level difference applies to all individuals in that group. We outline practical approaches and steps for researchers to explore mean group comparisons for the discovery of stratification biomarkers.
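
The link between a mean group difference and individual overlap can be sketched in a few lines of Python. This is a minimal illustration only: it assumes two normal groups with equal SD and uses a pooled-SD Cohen's d. The percentages quoted in the abstract rest on the authors' own definitions and simulations, so this sketch is not expected to reproduce them, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(cases, controls):
    """Standardized mean difference using the pooled sample SD."""
    n1, n2 = len(cases), len(controls)
    pooled_var = (
        (n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)
    ) / (n1 + n2 - 2)
    return (mean(cases) - mean(controls)) / sqrt(pooled_var)


def fraction_within_one_sd(d):
    """Share of cases lying within +/- 1 control SD of the control mean.

    Assumption: controls ~ N(0, 1) and cases ~ N(d, 1), i.e. both groups
    are normal with the same SD and differ only by a shift of d.
    """
    cases = NormalDist(mu=d, sigma=1.0)
    return cases.cdf(1.0) - cases.cdf(-1.0)


# Under these assumptions the overlap shrinks as the effect size grows.
for d in (0.2, 0.5, 1.0, 2.0):
    print(f"d = {d:.1f}: {fraction_within_one_sd(d):.1%} of cases within 1 SD")
```

Even for large values of d, a nonzero share of cases remains inside the typical range under this model, which is the general point the abstract makes about reading group means as statements about every individual.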


Subject(s)
Biomarkers/analysis , Computational Biology/education , Autistic Disorder/diagnosis , Case-Control Studies , Computational Biology/statistics & numerical data , Computer Simulation , Humans , Individuality , Mental Disorders/diagnosis , Neurodevelopmental Disorders/diagnosis , Neuropsychiatry/statistics & numerical data , Neuropsychology/statistics & numerical data , Normal Distribution , Sample Size
7.
Medicine (Baltimore) ; 100(37): e27257, 2021 Sep 17.
Article in English | MEDLINE | ID: mdl-34664875

ABSTRACT

Nasopharyngeal carcinoma (NPC) is one of the most prevalent head and neck cancers in southeast Asia. Further studies on the mechanisms of NPC occurrence and development are needed. In this study, we analyzed the microarray datasets GSE12452 and GSE53819, comprising 28 normal samples and 49 nasopharyngeal carcinoma samples, downloaded from the Gene Expression Omnibus (GEO). R software, STRING, CMap, and various databases were used to screen differentially expressed genes (DEGs), construct the protein-protein interaction (PPI) network, and carry out small-molecule compound analysis, among others. In total, 424 DEGs were selected from the datasets. DEGs were mainly enriched in extracellular matrix organization, cilium organization, the PI3K-Akt signaling pathway, collagen-containing extracellular matrix, and extracellular matrix-receptor interaction, among others. The top 10 upregulated and top 10 downregulated hub genes were identified as hub DEGs. Piperlongumine, apigenin, menadione, 1,4-chrysenequinone, and chrysin were identified as potential drugs to prevent and treat NPC. In addition, the effect of the genes CDK1, CDC45, RSPH4A, and ZMYND10 on survival in NPC was validated in the GEPIA database. The data revealed novel aberrantly expressed genes and pathways in NPC by bioinformatics analysis, potentially providing novel insights into the molecular mechanisms governing NPC progression. Although further studies are needed, the results indicated that the expression levels of CDK1, CDC45, RSPH4A, and ZMYND10 probably affected survival of NPC patients.


Subject(s)
Computational Biology/methods , Nasopharyngeal Neoplasms/genetics , Bibliometrics , Computational Biology/statistics & numerical data , Gene Expression Profiling/instrumentation , Gene Expression Profiling/methods , Humans , Nasopharyngeal Neoplasms/pathology
8.
PLoS Comput Biol ; 17(9): e1008991, 2021 09.
Article in English | MEDLINE | ID: mdl-34570758

ABSTRACT

Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package that provides a seamless R interface to a selection of popular MEME Suite tools. memes provides a novel "data aware" interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes.


Subject(s)
Amino Acid Motifs , Computational Biology/methods , Nucleotide Motifs , Software , Animals , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Computational Biology/statistics & numerical data , Data Interpretation, Statistical , Humans
9.
J Biomed Semantics ; 12(1): 15, 2021 08 09.
Article in English | MEDLINE | ID: mdl-34372934

ABSTRACT

BACKGROUND: The ontology authoring step in ontology development involves having to make choices about what subject domain knowledge to include. This may concern sorting out ontological differences and making choices between conflicting axioms due to limitations in the logic or the subject domain semantics. Examples are dealing with different foundational ontologies in ontology alignment and OWL 2 DL's transitive object property versus a qualified cardinality constraint. Such conflicts have to be resolved somehow. However, only isolated and fragmented guidance for doing so is available, which results in ad hoc decision-making that may not yield the best choice or may be forgotten later. RESULTS: This work aims to address this by taking steps towards a framework to deal with the various types of modeling conflicts through meaning negotiation and conflict resolution in a systematic way. It proposes an initial library of common conflicts, a conflict set, typical steps toward resolution, and the software availability and requirements needed for it. The approach was evaluated with an actual case of domain knowledge usage in the context of an epizootic disease outbreak, namely avian influenza, and with running examples based on COVID-19 ontologies. CONCLUSIONS: The evaluation demonstrated the potential and feasibility of a conflict resolution framework for ontologies.


Subject(s)
Biological Ontologies/statistics & numerical data , Computational Biology/statistics & numerical data , Information Storage and Retrieval/statistics & numerical data , Semantic Web , Semantics , Vocabulary, Controlled , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Databases, Factual/statistics & numerical data , Epidemics/prevention & control , Humans , Information Storage and Retrieval/methods , Logic , SARS-CoV-2/physiology
10.
J Biomed Semantics ; 12(1): 13, 2021 07 18.
Article in English | MEDLINE | ID: mdl-34275487

ABSTRACT

BACKGROUND: Effective response to public health emergencies, such as we are now experiencing with COVID-19, requires data sharing across multiple disciplines and data systems. Ontologies offer a powerful data sharing tool, and this holds especially for those ontologies built on the design principles of the Open Biomedical Ontologies Foundry. These principles are exemplified by the Infectious Disease Ontology (IDO), a suite of interoperable ontology modules aiming to provide coverage of all aspects of the infectious disease domain. At its center is IDO Core, a disease- and pathogen-neutral ontology covering just those types of entities and relations that are relevant to infectious diseases generally. IDO Core is extended by disease and pathogen-specific ontology modules. RESULTS: To assist the integration and analysis of COVID-19 data, and viral infectious disease data more generally, we have recently developed three new IDO extensions: IDO Virus (VIDO); the Coronavirus Infectious Disease Ontology (CIDO); and an extension of CIDO focusing on COVID-19 (IDO-COVID-19). Reflecting the fact that viruses lack cellular parts, we have introduced into IDO Core the term acellular structure to cover viruses and other acellular entities studied by virologists. We now distinguish between infectious agents - organisms with an infectious disposition - and infectious structures - acellular structures with an infectious disposition. This in turn has led to various updates and refinements of IDO Core's content. We believe that our work on VIDO, CIDO, and IDO-COVID-19 can serve as a model for yielding greater conformance with ontology building best practices. CONCLUSIONS: IDO provides a simple recipe for building new pathogen-specific ontologies in a way that allows data about novel diseases to be easily compared, along multiple dimensions, with data represented by existing disease ontologies. 
The IDO strategy, moreover, supports ontology coordination, providing a powerful method of data integration and sharing that allows physicians, researchers, and public health organizations to respond rapidly and efficiently to current and future public health crises.


Subject(s)
Biological Ontologies/statistics & numerical data , COVID-19/prevention & control , Communicable Disease Control/statistics & numerical data , Communicable Diseases/therapy , Computational Biology/statistics & numerical data , SARS-CoV-2/isolation & purification , COVID-19/epidemiology , COVID-19/virology , Communicable Disease Control/methods , Communicable Diseases/epidemiology , Communicable Diseases/transmission , Computational Biology/methods , Data Mining/methods , Data Mining/statistics & numerical data , Epidemics , Humans , Information Dissemination/methods , Public Health/methods , Public Health/statistics & numerical data , SARS-CoV-2/physiology , Semantics
11.
Sci Rep ; 11(1): 14125, 2021 07 08.
Article in English | MEDLINE | ID: mdl-34239004

ABSTRACT

miRNAs (or microRNAs) are small, endogenous, noncoding RNAs consisting of about 22 nucleotides. Cumulative evidence from biological experiments shows that miRNAs play a fundamental and important role in various biological processes. Therefore, the classification of miRNAs is a critical problem in computational biology. Due to the short length of mature miRNAs, many researchers are working on precursor miRNAs (pre-miRNAs), which have longer sequences and more structural features. Pre-miRNAs can be divided into two groups, mirtrons and canonical miRNAs, according to differences in their biogenesis. Compared to mirtrons, canonical miRNAs are more conserved and easier to identify. Many existing pre-miRNA classification methods rely on manual feature extraction. Moreover, these methods focus on either the sequential structure or the spatial structure of pre-miRNAs. To overcome the limitations of previous models, we propose a nucleotide-level hybrid deep learning method that combines a CNN and an LSTM network. The prediction resulted in 0.943 (95% CI ± 0.014) accuracy, 0.935 (95% CI ± 0.016) sensitivity, 0.948 (95% CI ± 0.029) specificity, 0.925 (95% CI ± 0.016) F1 score, and 0.880 (95% CI ± 0.028) Matthews correlation coefficient (MCC). When compared to the closest results, our proposed method achieved the best results for accuracy, F1 score, and MCC. These were 2.51%, 1.00%, and 2.43% higher than the closest ones, respectively. The mean sensitivity ranked first, on par with Linear Discriminant Analysis. The results indicate that the hybrid CNN and LSTM networks can be employed to achieve better performance for pre-miRNA classification. In future work, we will investigate new classification models that deliver better performance in terms of all the evaluation criteria.
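
For readers less familiar with the evaluation criteria listed above, the following sketch computes them from the four cells of a binary confusion matrix using their standard definitions. The function name and the example counts are hypothetical and are not the study's data.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        # MCC uses all four cells, so it stays informative on unbalanced data.
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch assumes every denominator is nonzero; a production version would need to handle classes with no predicted or no actual members.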


Subject(s)
Computational Biology/statistics & numerical data , Deep Learning/statistics & numerical data , Machine Learning/statistics & numerical data , MicroRNAs/classification , Algorithms , Humans , MicroRNAs/genetics , Neural Networks, Computer
12.
J Comput Aided Mol Des ; 35(7): 803-811, 2021 07.
Article in English | MEDLINE | ID: mdl-34244905

ABSTRACT

Within the scope of the SAMPL7 challenge for predicting physical properties, the Integral Equation Formalism of the Miertus-Scrocco-Tomasi (IEFPCM/MST) continuum solvation model has been used for the blind prediction of n-octanol/water partition coefficients and acidity constants of a set of 22 and 20 sulfonamide-containing compounds, respectively. The log P and pKa were computed using the B3LYP/6-31G(d) parametrized version of the IEFPCM/MST model. The performance of our method for partition coefficients yielded a root-mean-square error of 1.03 (log P units), placing this method among the most accurate theoretical approaches both globally (rank 8th) and among physical methods (rank 2nd). On the other hand, the deviation between predicted and experimental pKa values was 1.32 log units, making it the second best-ranked submission. Though this highlights the reliability of the IEFPCM/MST model for predicting the partitioning and the acid dissociation constant of drug-like compounds, the results are discussed to identify potential weaknesses and improve the performance of the method.


Subject(s)
Computational Biology/statistics & numerical data , Dipeptides/chemistry , Software/statistics & numerical data , Sulfonamides/chemistry , Computer Simulation/statistics & numerical data , Humans , Ligands , Models, Statistical , Octanols/chemistry , Quantum Theory , Solubility , Sulfonamides/therapeutic use , Thermodynamics , Water/chemistry
13.
PLoS Comput Biol ; 17(7): e1009244, 2021 07.
Article in English | MEDLINE | ID: mdl-34283824

ABSTRACT

The large amount of biological data available today makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the area of bioinformatics. Further, access to high-performance computing resources is necessary to achieve results in reasonable time. To speed up applications and utilize available compute resources as efficiently as possible, software developers make use of parallelization mechanisms, like multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study we investigated the behavior of well-known applications in bioinformatics with regard to their performance in terms of scaling, different virtual environments, and different datasets, using our benchmarking tool suite BOOTABLE. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition we added an application using the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but is applied to many biological problems, especially in the context of medical images (X-ray photographs). The mentioned tools have been analyzed in two different virtual environments: a virtual machine environment based on the OpenStack cloud software and a Docker environment. The measured performance values were compared to a bare-metal setup and to each other. The study reveals that the virtual environments used produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from using larger amounts of computing resources, whereas others showed an almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users to find the best amount of resources for their analysis. Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community's awareness of the efficient usage of computing resources.
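
The two quantities this kind of benchmark reports, scaling behavior and virtualization overhead, can be derived from wall-clock timings as in the sketch below. It is a generic illustration, not the BOOTABLE implementation; the function names are hypothetical and the timings are made-up placeholder values, not measurements from the study.

```python
def scaling_table(runtimes):
    """Map thread count -> (speedup, parallel efficiency).

    `runtimes` maps a thread count to wall-clock seconds and must contain
    an entry for 1 thread, which serves as the baseline.
    """
    baseline = runtimes[1]
    table = {}
    for threads, seconds in sorted(runtimes.items()):
        speedup = baseline / seconds
        table[threads] = (speedup, speedup / threads)
    return table


def relative_overhead(virtual_seconds, bare_metal_seconds):
    """Extra runtime of a virtualized run as a fraction of the bare-metal run."""
    return (virtual_seconds - bare_metal_seconds) / bare_metal_seconds


# Placeholder timings (seconds) for one hypothetical tool.
example = scaling_table({1: 100.0, 2: 50.0, 4: 40.0})
for threads, (speedup, efficiency) in example.items():
    print(f"{threads} threads: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
print(f"overhead: {relative_overhead(115.0, 100.0):.0%}")
```

A tool that scales almost linearly keeps its efficiency near 100% as threads are added, whereas a falling efficiency signals that additional cores are largely wasted on that workload.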


Subject(s)
Computational Biology/methods , Algorithms , Benchmarking , Cloud Computing , Computational Biology/standards , Computational Biology/statistics & numerical data , Computers , Computing Methodologies , Data Interpretation, Statistical , Databases, Factual/statistics & numerical data , High-Throughput Nucleotide Sequencing , Humans , Image Interpretation, Computer-Assisted , Machine Learning , Sequence Alignment , Software , User-Computer Interface
14.
Medicine (Baltimore) ; 100(24): e26271, 2021 Jun 18.
Article in English | MEDLINE | ID: mdl-34128858

ABSTRACT

BACKGROUND: Thymic epithelial tumors (TETs), originating from the thymic epithelial cells, are the most common primary neoplasms of the anterior mediastinum. Emerging evidence has demonstrated that competing endogenous RNAs (ceRNAs) have a crucial effect on tumor development. Hence, it is urgent to understand the regulatory mechanism of ceRNAs in TETs and its impact on tumor prognosis. METHODS: TET datasets were obtained from UCSC Xena as the training cohort, and differentially expressed mRNAs (DEmRNAs), lncRNAs (DElncRNAs), and miRNAs (DEmiRNAs) across the different pathologic types (A, AB, B, and TC) were identified with the DESeq2 package. The clusterProfiler package was used to carry out Gene Ontology and Kyoto Encyclopedia of Genes and Genomes functional analysis on the DEmRNAs. Subsequently, the lncRNA-miRNA-mRNA regulatory network was constructed to screen the key DEmRNAs. After the key DEmRNAs were verified in an external cohort from the Gene Expression Omnibus database, their associated ceRNA modules were used to perform Kaplan-Meier (K-M) and Cox regression analyses to establish their prognostic significance for TETs. Lastly, the prognostic significance was validated by receiver operating characteristic (ROC) curves and the area under the curve. RESULTS: A total of 463 DEmRNAs, 87 DElncRNAs, and 20 DEmiRNAs were obtained from the intersection of differentially expressed genes in different pathological types of TETs. Functional enrichment analysis showed that the DEmRNAs were closely related to cell proliferation and tumor development. After lncRNA-miRNA-mRNA network construction and external cohort validation, 4 DEmRNAs (DOCK11, MCAM, MYO10, and WASF3) were identified, and their associated ceRNA modules were significantly associated with prognosis. These modules contained 3 lncRNAs (LINC00665, NR2F1-AS1, and RP11-285A1.1), 4 mRNAs (DOCK11, MCAM, MYO10, and WASF3), and 4 miRNAs (hsa-mir-143, hsa-mir-141, hsa-mir-140, and hsa-mir-3199). Meanwhile, ROC curves verified the predictive accuracy of the screened ceRNA modules for the prognosis of TETs. CONCLUSION: Our study revealed that ceRNA modules might play a crucial role in the progression of TETs. The mRNA-associated ceRNA modules could effectively predict the prognosis of TETs and might serve as potential prognostic and therapeutic markers for TET patients.


Subject(s)
Computational Biology/statistics & numerical data , MicroRNAs/analysis , Neoplasms, Glandular and Epithelial/genetics , RNA, Long Noncoding/analysis , RNA, Messenger/analysis , Thymus Neoplasms/genetics , Biomarkers, Tumor/genetics , Cohort Studies , Computational Biology/methods , Datasets as Topic , Disease Progression , Gene Expression Regulation, Neoplastic/genetics , Gene Ontology , Gene Regulatory Networks , Humans , Predictive Value of Tests , Prognosis , Proportional Hazards Models , ROC Curve
15.
J Comput Aided Mol Des ; 35(7): 771-802, 2021 07.
Article in English | MEDLINE | ID: mdl-34169394

ABSTRACT

The Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges focus the computational modeling community on areas in need of improvement for rational drug design. The SAMPL7 physical property challenge dealt with prediction of octanol-water partition coefficients and pKa for 22 compounds. The dataset was composed of a series of N-acylsulfonamides and related bioisosteres. 17 research groups participated in the log P challenge, submitting 33 blind submissions in total. For the pKa challenge, 7 different groups participated, submitting 9 blind submissions in total. Overall, the accuracy of octanol-water log P predictions in the SAMPL7 challenge was lower than that of octanol-water log P predictions in SAMPL6, likely due to a more diverse dataset. Compared to the SAMPL6 pKa challenge, accuracy remains unchanged in SAMPL7. Interestingly, here, though macroscopic pKa values were often predicted with reasonable accuracy, there was dramatically more disagreement among participants as to which microscopic transitions produced these values (with methods often disagreeing even as to the sign of the free energy change associated with certain transitions), indicating that far more work needs to be done on pKa prediction methods.


Subject(s)
Computational Biology/statistics & numerical data , Computer Simulation/statistics & numerical data , Software/statistics & numerical data , Sulfonamides/chemistry , Drug Design/statistics & numerical data , Entropy , Humans , Ligands , Models, Chemical , Models, Statistical , Octanols/chemistry , Quantum Theory , Solubility , Solvents/chemistry , Sulfonamides/therapeutic use , Thermodynamics , Water/chemistry
16.
Comput Math Methods Med ; 2021: 5588385, 2021.
Article in English | MEDLINE | ID: mdl-34055039

ABSTRACT

To address the low optimization accuracy of the cuckoo search algorithm, a new search algorithm, the Elite Hybrid Binary Cuckoo Search (EHBCS) algorithm, is proposed, which improves on it through feature weighting and an elite strategy. The EHBCS algorithm has been designed for feature selection on a series of binary classification datasets, including low-dimensional and high-dimensional samples, using an SVM classifier. The experimental results show that the EHBCS algorithm achieves better classification performance than the binary genetic algorithm and the binary particle swarm optimization algorithm. In addition, we explain its superiority in terms of standard deviation, sensitivity, specificity, precision, and F-measure.


Subject(s)
Algorithms , Computational Biology/methods , Support Vector Machine , Animals , Birds , Classification , Computational Biology/statistics & numerical data , Databases, Factual/statistics & numerical data , Female , Humans , Male , Pattern Recognition, Automated/methods , Pattern Recognition, Automated/statistics & numerical data , Risk Factors
17.
Methods Mol Biol ; 2284: 147-179, 2021.
Article in English | MEDLINE | ID: mdl-33835442

ABSTRACT

The main purpose of pathway or gene set analysis methods is to provide mechanistic insight into the large amount of data produced in high-throughput studies. These tools were developed for gene expression analyses, but they have been rapidly adopted by other high-throughput techniques, becoming one of the foremost tools of omics research. Currently, according to different biological questions and data, we can choose among a vast plethora of methods and databases. Here we use two published examples of RNAseq datasets to approach multiple analyses of gene sets, networks and pathways using freely available and frequently updated software. Finally, we conclude this chapter by presenting a survival pathway analysis of a multiomics dataset. During this overview of different methods, we focus on visualization, which is a fundamental but challenging step in this computational field.


Subject(s)
Computational Biology/methods , Datasets as Topic/statistics & numerical data , RNA-Seq/statistics & numerical data , Animals , Computational Biology/statistics & numerical data , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Gene Regulatory Networks , Humans , Metabolic Networks and Pathways/genetics , RNA-Seq/methods , Software , Systems Integration , Transcriptome , Exome Sequencing/methods , Exome Sequencing/statistics & numerical data
18.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33754625

ABSTRACT

Over the last two decades, studies on microRNAs (miRNAs) and the number of annotated miRNAs in plants and animals have surged. Herein, we reviewed the current progress and challenges of miRNA annotation in plants. By comparing plant and animal miRNAs, we pinpointed the difficulties of plant miRNA annotation and proposed potential solutions. Recalling the history of methods and criteria in plant miRNA annotation, we detailed how the major advances were made and evolved. By collecting and categorizing bioinformatics tools for plant miRNA annotation, we surveyed their advantages and disadvantages, especially for tools that mimic the miRNA biogenesis pathway by parsing deeply sequenced small RNA (sRNA) libraries. In addition, we summarized all available databases hosting plant miRNAs and proposed potential optimizations, such as how to increase the signal-to-noise ratio (SNR) in these databases. Finally, we discussed the challenges and perspectives of plant miRNA annotation and indicated the possibilities offered by an all-in-one tool and platform that integrates artificial intelligence.


Subject(s)
Computational Biology/methods , Databases, Genetic , MicroRNAs/genetics , Plants/genetics , RNA, Plant/genetics , Artificial Intelligence , Computational Biology/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Regulation, Plant , Gene Regulatory Networks/genetics , Molecular Sequence Annotation/methods , Plants/classification
19.
Sci Rep ; 11(1): 5146, 2021 03 04.
Article in English | MEDLINE | ID: mdl-33664338

ABSTRACT

Multi-modal molecular profiling data in bulk tumors or single cells are accumulating at a fast pace. There is a great need for developing statistical and computational methods to reveal molecular structures in complex data types toward biological discoveries. Here, we introduce Nebula, a novel Bayesian integrative clustering analysis for high dimensional multi-modal molecular data to identify directly interpretable clusters and associated biomarkers in a unified and biologically plausible framework. To facilitate computational efficiency, a variational Bayes approach is developed to approximate the joint posterior distribution to achieve model inference in high-dimensional settings. We describe a pan-cancer data analysis of genomic, epigenomic, and transcriptomic alterations in close to 9000 tumor samples across canonical oncogenic signaling pathways, immune and stemness phenotype, with comparisons to state-of-the-art clustering methods. We demonstrate that Nebula has the unique advantage of revealing patterns on the basis of shared pathway alterations, offering biological and clinical insights beyond tumor type and histology in the pan-cancer analysis setting. We also illustrate the utility of Nebula in single cell data for immune cell decomposition in peripheral blood samples.


Subject(s)
Carcinogenesis/genetics , Computational Biology/statistics & numerical data , Genomics/statistics & numerical data , Neoplasms/genetics , Bayes Theorem , Cluster Analysis , Epigenomics , Humans , Models, Statistical , Neoplasms/pathology , Transcriptome/genetics
20.
Proteins ; 89(6): 697-707, 2021 06.
Article in English | MEDLINE | ID: mdl-33538038

ABSTRACT

Deep learning has emerged as a revolutionary technology for protein residue-residue contact prediction since the 2012 CASP10 competition. Considerable advancements in the predictive power of the deep learning-based contact predictions have been achieved since then. However, little effort has been put into interpreting the black-box deep learning methods. Algorithms that can interpret the relationship between predicted contact maps and the internal mechanism of the deep learning architectures are needed to explore the essential components of contact inference and improve their explainability. In this study, we present an attention-based convolutional neural network for protein contact prediction, which consists of two attention mechanism-based modules: sequence attention and regional attention. Our benchmark results on the CASP13 free-modeling targets demonstrate that the two attention modules added on top of existing typical deep learning models exhibit a complementary effect that contributes to prediction improvements. More importantly, the inclusion of the attention mechanism provides interpretable patterns that contain useful insights into the key fold-determining residues in proteins. We expect the attention-based model can provide a reliable and practically interpretable technique that helps break the current bottlenecks in explaining deep neural networks for contact prediction. The source code of our method is available at https://github.com/jianlin-cheng/InterpretContactMap.


Subject(s)
Computational Biology/statistics & numerical data , Deep Learning , Proteins/chemistry , Software , Benchmarking , Binding Sites , Databases, Protein , Humans , Protein Binding , Protein Conformation , Protein Interaction Domains and Motifs , Proteins/metabolism , Research Design , Sequence Alignment , Sequence Analysis, Protein